Sumset and Inverse Sumset Inequalities 
for Differential Entropy and Mutual Information 

Ioannis Kontoyiannis, Fellow, IEEE, and Mokshay Madiman, Member, IEEE

March 6, 2013 



Abstract 

The sumset and inverse sumset theories of Freiman, Plünnecke and Ruzsa give bounds connecting the cardinality of the sumset $A + B = \{a + b : a \in A, b \in B\}$ of two discrete sets $A, B$ to the cardinalities (or the finer structure) of the original sets $A, B$. For example, the sum-difference bound of Ruzsa states that $|A + B|\,|A|\,|B| \le |A - B|^3$, where the difference set is $A - B = \{a - b : a \in A, b \in B\}$. Interpreting the differential entropy $h(X)$ of a continuous random variable $X$ as (the logarithm of) the size of the effective support of $X$, the main contribution of this paper is a series of natural information-theoretic analogs for these results. For example, the Ruzsa sum-difference bound becomes the new inequality $h(X + Y) + h(X) + h(Y) \le 3h(X - Y)$, for any pair of independent continuous random variables $X$ and $Y$. Our results include differential-entropy versions of Ruzsa's triangle inequality, the Plünnecke-Ruzsa inequality, and the Balog-Szemerédi-Gowers lemma. We also give a differential entropy version of the Freiman-Green-Ruzsa inverse-sumset theorem, which can be seen as a quantitative converse to the entropy power inequality. Versions of most of these results for the discrete entropy $H(X)$ were recently proved by Tao, relying heavily on a strong, functional form of the submodularity property of $H(X)$. Since differential entropy is not functionally submodular, many of the corresponding discrete proofs fail in the continuous case, and in many cases substantially new proof strategies are required. We find that the basic property that naturally replaces discrete functional submodularity is the data processing property of mutual information.
1 Introduction



1.1 Motivation 

Roughly speaking, the field of additive combinatorics provides tools that allow us to count 
the number of occurrences of particular additive structures in specific subsets of a discrete 
group; see [17] for a broad introduction. The prototypical example is the study of the existence 
of arithmetic progressions within specific sets of integers, as opposed to the multiplicative structure that underlies prime factorization and much of classical combinatorics and number theory. There have been several major developments and a lot of high-profile mathematical activity in connection with additive combinatorics in recent years. Perhaps the most famous example is the celebrated Green-Tao theorem on the existence of arbitrarily long arithmetic progressions within the set of prime numbers.

An important collection of tools in additive combinatorics is a variety of sumset inequalities, the so-called Plünnecke-Ruzsa sumset theory; see [17] for details. The sumset $A + B$ of two discrete sets $A$ and $B$ is defined as $A + B = \{a + b : a \in A, b \in B\}$. A sumset inequality is an inequality connecting the cardinality $|A + B|$ of $A + B$ with the cardinalities $|A|$, $|B|$ of $A$ and $B$, respectively. For example, there are the obvious bounds,

$$\max\{|A|, |B|\} \le |A + B| \le |A|\,|B|, \tag{1}$$

as well as much more subtle results, like the Ruzsa triangle inequality [13], 

$$|A - C|\,|B| \le |A - B|\,|B - C|, \tag{2}$$

or the sum-difference bound [13], 

$$|A + B|\,|A|\,|B| \le |A - B|^3, \tag{3}$$

all of which hold for arbitrary subsets $A, B, C$ of the integers or of any other discrete abelian group, and where the difference set $A - B$ is defined as $A - B = \{a - b : a \in A, b \in B\}$.

In the converse direction, the Freiman-Ruzsa inverse sumset theory provides information 
about sets $A$ for which $|A + A|$ is close to being as small as possible; see Section 4 for a brief
discussion or the text [17] for details. 

In this context, recall that Shannon's asymptotic equipartition property (AEP) [3] says that 
the entropy H(X) of a discrete random variable X can be thought of as the logarithm of the 
effective cardinality of the alphabet of X. This suggests a correspondence between bounds for 
the cardinalities of sumsets, e.g., $|A + B|$, and corresponding bounds for the entropy of sums
of independent discrete random variables, e.g., H(X + Y). First identified by Ruzsa [14], this 
connection has also been explored in the last few years in different directions by, among others, 
Tao and Vu [18], Lapidoth and Pete [8], Madiman and Kontoyiannis [10], and Madiman, Marcus 
and Tetali [11]; additional pointers to the relevant literature are given below. 

This connection was developed most extensively by Tao in [16]. The main idea is to replace sets by (independent, discrete) random variables, and then to replace the log-cardinality, $\log|A|$, of each set $A$ by the (discrete, Shannon) entropy of the corresponding random variable (where log denotes the natural logarithm $\log_e$). Thus, for independent discrete random variables $X, Y, Z$, the simple bounds (1) become,

$$\max\{H(X), H(Y)\} \le H(X + Y) \le H(X) + H(Y),$$
which is a trivial exercise in manipulating entropy [3]. On the other hand, again for independent
discrete random variables, Ruzsa's influential bounds (2) and (3) become, respectively, 

$$H(X - Z) + H(Y) \le H(X - Y) + H(Y - Z)$$
and
$$H(X + Y) + H(X) + H(Y) \le 3H(X - Y),$$

which are nontrivial facts proved in [16]. 
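These discrete entropy inequalities can be checked numerically on any example. The following is a minimal Python sketch that assumes finitely supported integer-valued random variables, each stored as a dict mapping values to probabilities. The pmfs are arbitrary illustrative choices, and the helpers `entropy` and `combine` are hypothetical names. The sketch computes the laws of $X \pm Y$ by convolution and tests the elementary bounds, the entropy triangle inequality, and the entropy sum-difference bound.

```python
from collections import defaultdict
from math import log


def entropy(pmf):
    """Shannon entropy (in nats) of a pmf given as {value: probability}."""
    return -sum(p * log(p) for p in pmf.values() if p > 0)


def combine(px, py, sign):
    """pmf of X + sign*Y for independent X ~ px and Y ~ py (sign is +1 or -1)."""
    out = defaultdict(float)
    for x, p in px.items():
        for y, q in py.items():
            out[x + sign * y] += p * q
    return dict(out)


# Arbitrary illustrative pmfs on the integers (not taken from the text).
px = {0: 0.5, 1: 0.25, 3: 0.25}
py = {0: 0.7, 2: 0.2, 5: 0.1}
pz = {-1: 0.4, 0: 0.3, 4: 0.3}
tol = 1e-12  # slack for floating-point rounding

H = entropy
h_sum, h_diff = H(combine(px, py, +1)), H(combine(px, py, -1))

# Elementary bounds: max{H(X), H(Y)} <= H(X+Y) <= H(X) + H(Y).
simple_ok = max(H(px), H(py)) <= h_sum + tol and h_sum <= H(px) + H(py) + tol
# Entropy form of Ruzsa's triangle inequality: H(X-Z) + H(Y) <= H(X-Y) + H(Y-Z).
triangle_ok = H(combine(px, pz, -1)) + H(py) <= h_diff + H(combine(py, pz, -1)) + tol
# Entropy form of the sum-difference bound: H(X+Y) + H(X) + H(Y) <= 3H(X-Y).
sumdiff_ok = h_sum + H(px) + H(py) <= 3 * h_diff + tol
print(simple_ok, triangle_ok, sumdiff_ok)  # True True True
```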

Our main motivation is to examine the extent to which this analogy can be carried further: 
According to the AEP [3], the differential entropy h(X) of a continuous random variable X can 
be thought of as the logarithm of the "size of the effective support" of X. In this work we state 
and prove natural "differential entropy analogs" of various sumset and inverse-sumset bounds, 
many of which were proved for the discrete Shannon entropy in the recent work of Tao [16] and 
in earlier papers by Kaimanovich and Vershik [7], Tao and Vu [18], Madiman [9], Ruzsa [14],
and Madiman, Marcus and Tetali [11]. 

Of particular interest in motivating these results is the fact that the main technical ingredient 
in the proofs of many of the corresponding discrete bounds was a strong, functional form of the 
submodularity property of the discrete Shannon entropy; see Section 2 for details. The fact that 
differential entropy is not functionally submodular was the source of the main difficulty as well 
as the main interest for the present development. 

1.2 Outline of main results 

In Section 2, after briefly reviewing some necessary background and basic definitions, we discuss 
the functional submodularity of the discrete entropy, and explain how it fails for differential 
entropy. 

Section 3 contains most of our main results, namely, a series of natural differential entropy analogs of the sumset bounds in [16] and in the earlier papers mentioned above. In Theorem 3.1 we prove the following version of the Ruzsa triangle inequality: If $X, Y, Z$ are independent, then:

$$h(X - Z) \le h(X - Y) + h(Y - Z) - h(Y).$$

In Theorem 3.5, we prove the doubling-difference inequality: If $X_1, X_2$ are independent and identically distributed (i.i.d.), then:

$$\frac{1}{2} \le \frac{h(X_1 + X_2) - h(X_1)}{h(X_1 - X_2) - h(X_1)} \le 2.$$

More generally, when $X_1, X_2$ are independent but not identically distributed, the following sum-difference inequality holds, given in Theorem 3.7: If $X_1, X_2$ are independent, then:

$$h(X_1 + X_2) \le 3h(X_1 - X_2) - h(X_1) - h(X_2).$$

A version of the Plünnecke-Ruzsa inequality for differential entropy is given in Theorem 3.11: If $X, Y_1, Y_2, \ldots, Y_n$ are independent and there are constants $K_1, K_2, \ldots, K_n$ such that,

$$h(X + Y_i) \le h(X) + \log K_i, \quad \text{for each } i,$$

then

$$h(X + Y_1 + Y_2 + \cdots + Y_n) \le h(X) + \log K_1 K_2 \cdots K_n.$$
An application of this result gives the iterated sum bound of Theorem 3.12.

Next we prove a Balog-Szemerédi-Gowers lemma for differential entropy. It says that, if $X, Y$ are weakly dependent and $X + Y$ has small entropy, then there exist conditionally independent versions of $X, Y$ that have almost the same entropy, and whose independent sum still has small entropy. Specifically, in Theorem 3.14 we prove the following: Suppose $X, Y$ satisfy $I(X; Y) \le \log K$, for some $K \ge 1$, and suppose also that,

$$h(X + Y) \le \tfrac{1}{2}h(X) + \tfrac{1}{2}h(Y) + \log K.$$

Let $X_1, X_2$ be conditionally independent versions of $X$ given $Y$, and let $Y'$ be a conditionally independent version of $Y$ given $X_1$, so that $X_2, Y, X_1, Y'$ forms a Markov chain. Then:

$$h(X_2 | X_1, Y) \ge h(X) - \log K,$$
$$h(Y' | X_1, Y) \ge h(Y) - \log K,$$
$$h(X_2 + Y' | X_1, Y) \le \tfrac{1}{2}h(X) + \tfrac{1}{2}h(Y) + 7\log K.$$

The main interest in the proofs of the above results is that, in most cases, the corresponding discrete-entropy proofs do not generalize in a straightforward manner. There, the main technical tool used is a strong, functional submodularity property of $H(X)$, which does not hold for differential entropy. Moreover, for several results it is the overall proof structure that does not carry over to the continuous case. Not only the method but also some of the important intermediate steps fail to hold for differential entropy, so substantially new proof strategies are required.

The main technical ingredient in our proofs is the data processing property of mutual information. Indeed, most of the bounds in Section 3 can be equivalently stated in terms of mutual information instead of differential entropy. And since data processing is universal, in that it holds regardless of the space in which the relevant random variables take values, these proofs offer alternative derivations for the discrete counterparts of the results. The earlier discrete versions are discussed in Section 3.4, where we also describe the entropy version of the Ruzsa covering lemma and the fact that its obvious generalization fails in the continuous case.

In Section 4 we give a version of the Freiman-Green-Ruzsa inverse-sumset theorem for differential entropy. Roughly speaking, Tao in [16] proves that, if the entropy $H(X + X')$ of the sum of two i.i.d. copies of a discrete random variable $X$ is close to $H(X)$, then $X$ is approximately uniformly distributed on a generalized arithmetic progression.

In the continuous case, the entropy power inequality [3] says that, if $X, X'$ are i.i.d., then,

$$h(X + X') \ge h(X) + \tfrac{1}{2}\log 2,$$

with equality if and only if $X$ is Gaussian. In Theorem 4.1 we state and prove a quantitative converse to this statement. Under certain regularity conditions on the density of $X$, we show that if $h(X + X')$ is not much larger than $h(X) + \tfrac{1}{2}\log 2$, then $X$ will necessarily be approximately Gaussian, in that the relative entropy between its density and that of a Gaussian with the same mean and variance will be correspondingly small.

Finally we note that, since additive noise is one of the most common modeling assumptions in Shannon theory, it is natural to expect that some of the bounds developed here may have applications in core information-theoretic problems. Preliminary connections in this direction can be found in the recent work of Cohen and Zamir [2], Etkin and Ordentlich [4], and Wu, Shamai and Verdú [19].
2 Elementary Bounds and Preliminaries

The entropy of a discrete random variable X with probability mass function P on the alphabet 
$A$ is $H(X) = E[-\log P(X)] = \sum_{x \in A} P(x)\log(1/P(x))$. Throughout the paper, log denotes
the natural logarithm $\log_e$, and the support (or alphabet) of any discrete random variable
X is assumed to be a (finite or countably infinite) subset of the real line or of an arbitrary 
discrete abelian group. Perhaps the simplest bound on the entropy H(X + Y) of the sum of two 
independent random variables X, Y is, 

$$H(X + Y) \ge \max\{H(X), H(Y)\},$$

which easily follows from elementary properties [3], 

$$H(X) + H(Y) = H(X, Y) = H(Y, X + Y) = H(X + Y) + H(Y | X + Y) \le H(X + Y) + H(Y), \tag{4}$$

and similarly with the roles of X and Y interchanged. The first and third equalities follow 
from the chain rule and independence, the second equality follows from the "data processing" 
property that H(F(Z)) = H(Z) if F is a one-to-one function, and the inequality follows from 
the fact that conditioning reduces entropy. 

A similar argument using the nonnegativity of conditional entropy [3], 

$$H(X) + H(Y) = H(Y, X + Y) = H(X + Y) + H(Y | X + Y) \ge H(X + Y),$$

gives the upper bound, 

$$H(X + Y) \le H(X) + H(Y). \tag{5}$$

The starting point of our development is the recent work of Tao [16], where a series of sumset 
bounds are established for H(X), beginning with the elementary inequalities (4) and (5). The 
arguments in [16] are largely based on the following important observation [16] [11]: 

Lemma 2.1 (Functional submodularity of Shannon entropy) If $X_0 = F(X_1) = G(X_2)$ and $X_{12} = R(X_1, X_2)$, then:

$$H(X_{12}) + H(X_0) \le H(X_1) + H(X_2).$$

Proof. By data processing for mutual information and entropy, $H(X_1) + H(X_2) - H(X_{12}) \ge H(X_1) + H(X_2) - H(X_1, X_2) = I(X_1; X_2) \ge I(X_0; X_0) = H(X_0)$. □
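The following Python sketch shows Lemma 2.1 on one concrete discrete instance. It assumes an arbitrary, possibly dependent, joint law of three bits $(U, V, W)$ and sets $X_1 = (U, V)$, $X_2 = (V, W)$, $X_0 = V$, and $X_{12} = (U, V, W)$. In this case the lemma reduces to ordinary submodularity. The helper names `entropy` and `law_of` are hypothetical.

```python
import itertools
import random
from collections import defaultdict
from math import log


def entropy(pmf):
    """Shannon entropy (in nats) of a pmf given as {value: probability}."""
    return -sum(p * log(p) for p in pmf.values() if p > 0)


def law_of(joint, f):
    """pmf of f(u, v, w) when (U, V, W) has the given joint pmf."""
    out = defaultdict(float)
    for (u, v, w), p in joint.items():
        out[f(u, v, w)] += p
    return dict(out)


random.seed(0)  # arbitrary, possibly dependent, joint law of (U, V, W) on {0,1}^3
weights = {k: random.random() for k in itertools.product(range(2), repeat=3)}
total = sum(weights.values())
joint = {k: wt / total for k, wt in weights.items()}

H1 = entropy(law_of(joint, lambda u, v, w: (u, v)))      # X1 = (U, V)
H2 = entropy(law_of(joint, lambda u, v, w: (v, w)))      # X2 = (V, W)
H0 = entropy(law_of(joint, lambda u, v, w: v))           # X0 = F(X1) = G(X2) = V
H12 = entropy(law_of(joint, lambda u, v, w: (u, v, w)))  # X12 = R(X1, X2)

print(H12 + H0 <= H1 + H2 + 1e-12)  # Lemma 2.1 for this instance
```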

One of our main goals in this work is to examine the extent to which the bounds in [16] and 
in earlier work extend to the continuous case. The differential entropy of a continuous random variable $X$ with density $f$ on $\mathbb{R}$ is $h(X) = E[-\log f(X)] = \int f(x)\log(1/f(x))\,dx$. The differential entropy of any finite-dimensional, continuous random vector $X = (X_1, X_2, \ldots, X_n)$ is defined analogously, in terms of the joint density of the $X_i$. In order to avoid uninteresting
technicalities, we assume throughout that the differential entropies in the statements of all our 
results exist and are finite. 

The first important difference between H(X) and h(X) is that the differential entropy of a 
one-to-one function of X is typically different from that of X itself, even for linear functions [3]: 
For any continuous random vector $X$ and any nonsingular matrix $T$, $h(TX) = h(X) + \log|\det(T)|$,
which is different from h(X) unless T has determinant equal to ±1. 

The upper bound in (5) also fails in general for independent continuous $X, Y$. Take, e.g., $X, Y$ to be independent Gaussians, one with variance $\sigma^2 > 2\pi e$ and the other with variance $1/\sigma^2$. Then $h(X) + h(Y) = \log(2\pi e)$, while $h(X + Y) = \frac{1}{2}\log(2\pi e(\sigma^2 + 1/\sigma^2)) > \log(2\pi e)$. The functional submodularity Lemma 2.1 similarly fails for differential entropy. For example, take $X_1 = X_2$ to be an arbitrary continuous random variable with finite entropy, $F(x) = G(x) = x$, and $R(x, x') = ax$ for some $a > 1$. The obvious differential-entropy analog of Lemma 2.1 then yields $\log a \le 0$.
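Both failures can be reproduced with the closed-form Gaussian entropy $h = \frac{1}{2}\log(2\pi e v)$ for variance $v$. The Python sketch below makes two assumptions: the specific choice $\sigma^2 = 2\pi e + 1$, and the scale factor $a = 3$. The function name `gauss_entropy` is a hypothetical helper. Part (a) measures how much (5) is violated. Part (b) confirms $h(aX) = h(X) + \log a$ and shows that the analog of Lemma 2.1 fails.

```python
from math import e, log, pi


def gauss_entropy(var):
    """Differential entropy (nats) of a centered Gaussian with variance var."""
    return 0.5 * log(2 * pi * e * var)


# (a) Upper bound (5) fails: independent Gaussians with variances sigma2 and 1/sigma2.
sigma2 = 2 * pi * e + 1.0                    # any value > 2*pi*e works
h_x, h_y = gauss_entropy(sigma2), gauss_entropy(1 / sigma2)
h_sum = gauss_entropy(sigma2 + 1 / sigma2)   # variances add under independence
gap = h_sum - (h_x + h_y)                    # gap > 0 means (5) is violated
print(f"h(X+Y) - h(X) - h(Y) = {gap:.4f}")

# (b) Lemma 2.1 fails: with X1 = X2 = X ~ N(0, 1) and X12 = aX, its analog
# would read h(aX) + h(X) <= 2h(X), i.e. log a <= 0.
a = 3.0
h_1 = gauss_entropy(1.0)
h_scaled = gauss_entropy(a ** 2)             # aX ~ N(0, a^2), so h(aX) = h(X) + log a
analog_holds = h_scaled + h_1 <= 2 * h_1
print(abs(h_scaled - h_1 - log(a)) < 1e-12, analog_holds)
```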

On the other hand, the simple lower bound in (4) does generalize,

$$h(X + Y) \ge \max\{h(X), h(Y)\}, \tag{6}$$
and is equivalent to the data processing inequality,

$$\min\{I(X + Y; X), I(X + Y; Y)\} \ge 0,$$

since, 

$$0 \le I(X + Y; X) = h(X + Y) - h(X + Y | X) = h(X + Y) - h(Y | X) = h(X + Y) - h(Y),$$

and similarly for h(X) in place of h(Y); here we use the fact that differential entropy is 
translation-invariant. 

In the rest of the paper, all standard properties of h(X) and H{X) will be used without 
explicit reference; they can all be found, e.g., in [3]. Since it will play a particularly central role 
in what follows, we recall that the mutual information $I(X; Y)$ between two arbitrary continuous random vectors $X, Y$ can be defined as,

$$I(X; Y) = h(X) - h(X | Y) = h(Y) - h(Y | X) = h(X) + h(Y) - h(X, Y),$$

and the data processing inequality states that, whenever $X$ and $Z$ are conditionally independent given $Y$, we have,

$$I(X; Y) \ge I(X; Z).$$

The development in Section 3 will be largely based on the idea that the use of functional submodularity can be avoided by reducing the inequalities of interest to data-processing inequalities
for appropriately defined mutual informations. This reduction is sometimes straightforward, but 
sometimes far from obvious. 
3 Sumset Bounds for Differential Entropy



Throughout the rest of the paper, unless explicitly stated otherwise, all random variables are 
assumed to be real-valued and continuous (i.e., with distributions absolutely continuous with
respect to Lebesgue measure, or in other words, having a probability density function), and the 
differential entropy of any random variable or random vector appearing in the statement of any 
of our results is assumed to exist and be finite. 

3.1 Ruzsa distance and the doubling and difference constants 

In analogy with the corresponding definition for discrete random variables [16], we define the 
Ruzsa distance between any two continuous random variables X and Y as, 

$$\mathrm{dist}_R(X, Y) = h(X' - Y') - \tfrac{1}{2}h(X') - \tfrac{1}{2}h(Y'),$$

where $X' \sim X$ and $Y' \sim Y$ are independent. It is obvious that $\mathrm{dist}_R$ is symmetric, and it is nonnegative because of the lower bound in (6). Our first result states that it also satisfies the
triangle inequality: 

Theorem 3.1 (Ruzsa triangle inequality for differential entropy) If $X, Y, Z$ are independent, then:

$$h(X - Z) \le h(X - Y) + h(Y - Z) - h(Y).$$
Equivalently, for arbitrary random variables $X, Y, Z$:

$$\mathrm{dist}_R(X, Z) \le \mathrm{dist}_R(X, Y) + \mathrm{dist}_R(Y, Z).$$

The proof of the discrete version of this result in [16] is based on the discrete entropy analog 
of the bound, 

$$h(X, Y, Z) + h(X - Z) \le h(X - Y, Y - Z) + h(X, Z), \tag{7}$$

which is proved using the functional submodularity Lemma 2.1. Although Lemma 2.1 fails for differential entropy in general, we may try to adapt its proof in this particular setting. However, the obvious modification of the discrete proof also fails in the continuous case. The analog of the first inequality in the proof of Lemma 2.1, corresponding to $H(X_{12}) \le H(X_1, X_2)$, is,

$$h(X, Y, Z) \le h(X - Y, Y - Z, X, Z),$$

which is false, since $(X - Y, Y - Z, X, Z)$ is concentrated on a lower-dimensional subspace of $\mathbb{R}^4$, and so the term on the right side is $-\infty$. Nevertheless, the actual inequality (7) does hold true.

Lemma 3.2 The inequality (7) holds true for any three independent random variables X, Y, Z, 
and it is equivalent to the following data processing inequality: 

$$I(X; (X - Y, Y - Z)) \ge I(X; X - Z). \tag{8}$$






Proof. Inequality (8) is an immediate consequence of data processing, since $X - Z = (X - Y) + (Y - Z)$; therefore, $X$ and $X - Z$ are conditionally independent given $(X - Y, Y - Z)$. To see that it is equivalent to (7), note that the right-hand side of (8) is,

$$h(X - Z) - h(X - Z | X) = h(X - Z) - h(Z),$$

while the left-hand side is,

$$h(X - Y, Y - Z) + h(X) - h(X, X - Y, Y - Z) = h(X - Y, Y - Z) + h(X) - h(X, Y, Z) = h(X - Y, Y - Z) - h(Y) - h(Z),$$

where the first equality above follows from the fact that the linear map $(x, x - y, y - z) \mapsto (x, y, z)$ has determinant 1. Hence (8) reads $h(X - Y, Y - Z) - h(Y) \ge h(X - Z)$. Adding $h(X) + h(Z)$ to both sides and using independence, $h(X, Y, Z) = h(X) + h(Y) + h(Z)$ and $h(X, Z) = h(X) + h(Z)$, gives exactly (7). □

Proof of Theorem 3.1. Rearranging (7) and using independence, 

$$h(X - Z) \le h(X - Y, Y - Z) - h(Y) \le h(X - Y) + h(Y - Z) - h(Y).$$

This is easily seen to be the same as the claimed inequality upon substituting the definition of 
the Ruzsa distances in terms of differential entropies. □ 
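For independent centered Gaussians with variances $a, b, c$, Theorem 3.1 reduces to $(a + c)\,b \le (a + b)(b + c)$. This holds because the right side exceeds the left by $b^2 + ac \ge 0$. The Python sketch below checks this over a small grid of variances. The grid values and the helper names are illustrative assumptions.

```python
import itertools
from math import e, log, pi


def gauss_entropy(var):
    """Differential entropy (nats) of a centered Gaussian with variance var."""
    return 0.5 * log(2 * pi * e * var)


def triangle_slack(a, b, c):
    """h(X-Y) + h(Y-Z) - h(Y) - h(X-Z) for independent N(0,a), N(0,b), N(0,c)."""
    lhs = gauss_entropy(a + c)
    rhs = gauss_entropy(a + b) + gauss_entropy(b + c) - gauss_entropy(b)
    return rhs - lhs   # equals 0.5*log((a+b)(b+c) / (b(a+c))) >= 0


variances = [0.01, 0.5, 1.0, 7.0, 100.0]
worst = min(triangle_slack(a, b, c) for a, b, c in itertools.product(variances, repeat=3))
print(f"smallest slack over the grid: {worst:.6f}")
```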

Replacing Y by —Y, the triangle inequality yields: 
Lemma 3.3 If $X, Y, Z$ are independent, then:

$$h(X - Z) + h(Y) \le h(X + Y) + h(Y + Z).$$
In a similar vein we also have: 
Lemma 3.4 If $X, Y, Z$ are independent, then,

$$h(X + Y + Z) + h(Y) \le h(X + Y) + h(Y + Z),$$
which is equivalent to the data processing inequality,

$$I(X + Y + Z; X) \le I(X + Y; X). \tag{9}$$

Proof. The equivalence of the two stated inequalities follows from the observation that 

$$I(X + Y + Z; X) = h(X + Y + Z) - h(X + Y + Z | X) = h(X + Y + Z) - h(Y + Z | X) = h(X + Y + Z) - h(Y + Z),$$

and similarly,

$$I(X + Y; X) = h(X + Y) - h(X + Y | X) = h(X + Y) - h(Y | X) = h(X + Y) - h(Y).$$
By the data processing inequality for mutual information, and the assumed independence,
$$I(X + Y + Z; X) \le I(X + Y, Z; X) = I(X + Y; X) + I(Z; X | X + Y) = I(X + Y; X),$$
which proves (9) and hence the lemma. □
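In the Gaussian case, the mutual-information form (9) becomes very concrete. For an independent centered Gaussian signal $S$ with variance $s$ and noise $N$ with variance $n$, $I(S + N; S) = \frac{1}{2}\log((s + n)/n)$. Adding the extra independent "noise" $Z$ can only lower the information about $X$. The Python sketch below checks this on an illustrative grid of variances. The helper `gauss_mi` is a hypothetical name.

```python
import itertools
from math import log


def gauss_mi(var_signal, var_noise):
    """I(S + N; S) in nats for independent centered Gaussians S and N."""
    return 0.5 * log((var_signal + var_noise) / var_noise)


variances = [0.1, 1.0, 3.0, 25.0]
violations = [
    (a, b, c)
    for a, b, c in itertools.product(variances, repeat=3)
    # (9): I(X+Y+Z; X) <= I(X+Y; X); the "noise" is Y+Z on the left and Y on the right.
    if gauss_mi(a, b + c) > gauss_mi(a, b) + 1e-12
]
print(violations)  # []
```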
Combining the last two lemmas yields:

Theorem 3.5 (Doubling-difference inequality) If $X_1, X_2$ are i.i.d., then:

$$\frac{1}{2} \le \frac{h(X_1 + X_2) - h(X_1)}{h(X_1 - X_2) - h(X_1)} \le 2.$$

Equivalently:

$$\frac{1}{2} \le \frac{I(X_1 + X_2; X_2)}{I(X_1 - X_2; X_2)} \le 2.$$

Proof. For the upper bound, take $X$, $-Y$ and $Z$ i.i.d. in Lemma 3.4. Then,
$$h(X + Z) + h(Y) \le h(X + Y + Z) + h(Y) \le h(X + Y) + h(Z + Y),$$

where the first inequality is (6). Since $X + Y$ and $Z + Y$ both have the same distribution as $X - Z$, and $h(Y) = h(X)$, this gives

$$h(X + Z) + h(X) \le 2h(X - Z),$$

or,

$$h(X + Z) - h(X) \le 2[h(X - Z) - h(X)],$$
as required. For the lower bound, Lemma 3.3 with $X, Y, Z$ i.i.d. yields (after relabeling identically distributed terms),

$$h(X - Y) + h(X) \le 2h(X + Y),$$

i.e.,

$$h(X - Y) - h(X) \le 2[h(X + Y) - h(X)],$$

which is the stated lower bound. The equivalence between the entropy bounds and the corresponding mutual information bounds can be established easily, as in the first part of the proof of Lemma 3.4. □
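Here is a non-Gaussian illustration of Theorem 3.5. The example assumes $X_1, X_2$ i.i.d. Exponential(1), which is not an example from the text. It uses standard closed forms in nats. $h(X_1) = 1$. $X_1 + X_2 \sim$ Gamma(2, 1), with entropy $1 + \gamma$, where $\gamma$ is the Euler-Mascheroni constant (hard-coded below). $X_1 - X_2 \sim$ Laplace(0, 1), with entropy $1 + \log 2$. The ratio in the theorem is therefore $\gamma/\log 2$, which lies well inside $[\frac{1}{2}, 2]$.

```python
from math import log

EULER_GAMMA = 0.5772156649015329   # Euler-Mascheroni constant, hard-coded

# X1, X2 i.i.d. Exponential(1); closed-form differential entropies in nats.
h_x = 1.0                      # h(Exp(1))
h_sum = 1.0 + EULER_GAMMA      # X1 + X2 ~ Gamma(2, 1): h = 2 - digamma(2) = 1 + gamma
h_diff = 1.0 + log(2.0)        # X1 - X2 ~ Laplace(0, 1): h = 1 + log 2

ratio = (h_sum - h_x) / (h_diff - h_x)   # = gamma / log 2
print(f"ratio = {ratio:.4f}")  # ratio = 0.8327
```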

Theorem 3.5 examines a basic constraint that the differential entropies of the sum and 
difference of two i.i.d. random variables place on each other. These quantities are, of course, 
identical when the density under consideration is symmetric, but there does not seem to be an 
immediate intuitive reason for them to be mutually constraining in the case when the difference 
$X_1 - X_2$ has a symmetric density but the sum $X_1 + X_2$ does not. Indeed, Lapidoth and Pete [8] showed that the entropies of the sum and difference of two i.i.d. random variables can differ by an arbitrarily large amount: Given any $M > 0$, there exist i.i.d. $X_1, X_2$ of finite differential entropy, such that,

$$h(X_1 + X_2) - h(X_1 - X_2) > M. \tag{10}$$

Consider the "entropy-increase" due to addition or subtraction, where $Y'$ is an independent copy of $Y$:

$$\Delta_+ = h(Y + Y') - h(Y), \qquad \Delta_- = h(Y - Y') - h(Y).$$

Then (10) states that the difference $\Delta_+ - \Delta_-$ can be arbitrarily large, while Theorem 3.5 asserts that the ratio $\Delta_+/\Delta_-$ must always lie between $\frac{1}{2}$ and 2.

In other words, define the doubling constant and the difference constant of a random variable $X$ as,

$$\sigma[X] = \exp\{h(X + X') - h(X)\} \quad \text{and} \quad \delta[X] = \exp\{h(X - X') - h(X)\},$$
respectively, where $X'$ is an independent copy of $X$. Then Theorem 3.5 says that:
Corollary 3.6 For any random variable $X$,

$$\tfrac{1}{2}\,\mathrm{dist}_R(X, X) \le \mathrm{dist}_R(X, -X) \le 2\,\mathrm{dist}_R(X, X),$$

equivalently,

$$\delta[X]^{1/2} \le \sigma[X] \le \delta[X]^2.$$
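The constants $\sigma[X]$ and $\delta[X]$ can be computed directly whenever $h(X)$, $h(X + X')$ and $h(X - X')$ are known in closed form. The Python sketch below evaluates them for three assumed examples and checks Corollary 3.6. The examples are a unit-variance Gaussian, where $\sigma = \delta = \sqrt{2}$, and Uniform(0, 1), whose sum and difference are triangular with entropy $\frac{1}{2}$. The third is Exponential(1), with the same closed forms as in the sketch after Theorem 3.5. The example names are illustrative labels only.

```python
from math import e, exp, log, pi, sqrt

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant, hard-coded

# (h(X), h(X + X'), h(X - X')) in nats, from standard closed forms.
examples = {
    "Gaussian(var=1)": (0.5 * log(2 * pi * e), 0.5 * log(4 * pi * e), 0.5 * log(4 * pi * e)),
    "Uniform(0,1)": (0.0, 0.5, 0.5),          # sum and difference are triangular
    "Exponential(1)": (1.0, 1.0 + EULER_GAMMA, 1.0 + log(2.0)),
}

results = {}
for name, (h_x, h_sum, h_diff) in examples.items():
    sigma = exp(h_sum - h_x)   # doubling constant sigma[X]
    delta = exp(h_diff - h_x)  # difference constant delta[X]
    results[name] = sqrt(delta) <= sigma <= delta ** 2   # Corollary 3.6
    print(f"{name:16s} sigma={sigma:.4f} delta={delta:.4f} bound holds: {results[name]}")
```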

Note. As mentioned on pp. 64-65 of [17], the analog of the above upper bound, $\sigma[X] \le \delta[X]^2$, is established in additive combinatorics via an application of the Plünnecke-Ruzsa inequalities. It is interesting to note that the entropy version of this result (in both the discrete and the continuous case) can be deduced directly from elementary arguments. Perhaps this is less surprising in view of two facts. First, strong versions of the Plünnecke-Ruzsa inequality can also be established by elementary methods in the entropy setting. Second, the (surprising and very recent) work of Petridis [12] gives an elementary proof of the Plünnecke-Ruzsa inequality for sumsets. See Sections 3.2 and 3.4, and the discussion in [15, 11].

We now come to the first result whose proof in the continuous case is necessarily significantly different from that of its discrete counterpart.

Theorem 3.7 (Sum-difference inequality for differential entropy) For any two independent random variables $X, Y$:

$$h(X + Y) \le 3h(X - Y) - h(X) - h(Y). \tag{11}$$

Equivalently, for any pair of random variables $X, Y$,

$$\mathrm{dist}_R(X, -Y) \le 3\,\mathrm{dist}_R(X, Y). \tag{12}$$

The equivalence of (12) and (11) follows simply from the definition of the Ruzsa distance. 
Before giving the proof, we state and prove the following simple version of the theorem in terms 
of mutual information: 

Corollary 3.8 (Sum-difference inequality for information) For any pair of independent random variables $X, Y$, and all $0 \le \alpha \le 1$:

$$\alpha I(X + Y; X) + (1 - \alpha) I(X + Y; Y) \le (1 + \alpha) I(X - Y; X) + (2 - \alpha) I(X - Y; Y).$$

Proof. Subtracting $h(X)$ from both sides of (11) yields

$$h(X + Y) - h(X) \le 3h(X - Y) - 2h(X) - h(Y),$$

or equivalently,

$$h(X + Y) - h(X + Y | Y) \le 2[h(X - Y) - h(X - Y | Y)] + [h(X - Y) - h(X - Y | X)],$$

which, in terms of mutual information, becomes,

$$I(X + Y; Y) \le 2I(X - Y; Y) + I(X - Y; X). \tag{13}$$

Repeating the same argument, this time subtracting $h(Y)$ instead of $h(X)$ from both sides, gives,

$$I(X + Y; X) \le 2I(X - Y; X) + I(X - Y; Y). \tag{14}$$

Multiplying (14) by $\alpha$ and (13) by $(1 - \alpha)$, and adding the two inequalities, gives the stated result.

□ 
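Inequality (11) can be checked in the same closed-form settings used above. For independent centered Gaussians with variances $a, b$, the slack $3h(X - Y) - h(X) - h(Y) - h(X + Y)$ equals $\frac{1}{2}\log\big((a + b)^2/(ab)\big) \ge \log 2$. For $X, Y$ i.i.d. Exponential(1), an assumed example, it equals $3\log 2 - \gamma > 0$. The Python sketch below evaluates both. The grid and helper names are illustrative assumptions.

```python
import itertools
from math import e, log, pi

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant, hard-coded


def gauss_entropy(var):
    """Differential entropy (nats) of a centered Gaussian with variance var."""
    return 0.5 * log(2 * pi * e * var)


def gauss_slack(a, b):
    """3h(X-Y) - h(X) - h(Y) - h(X+Y) for independent Gaussians with variances a, b."""
    h_pm = gauss_entropy(a + b)  # X + Y and X - Y are both N(0, a + b)
    return 3 * h_pm - gauss_entropy(a) - gauss_entropy(b) - h_pm


grid = [1e-3, 0.2, 1.0, 9.0, 1e4]
gauss_min = min(gauss_slack(a, b) for a, b in itertools.product(grid, repeat=2))

# X, Y i.i.d. Exponential(1): h(X)=h(Y)=1, h(X+Y)=1+gamma, h(X-Y)=1+log 2.
exp_slack = 3 * (1 + log(2)) - 1 - 1 - (1 + EULER_GAMMA)
print(gauss_min >= 0, exp_slack > 0)  # True True
```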
The inequality (11) of Theorem 3.7 is an immediate consequence of the following proposition.

Proposition 3.9 Suppose $X, Y$ are independent, let $Z = X - Y$, and let $(X_1, Y_1)$ and $(X_2, Y_2)$ be two conditionally independent versions of $(X, Y)$ given $Z$. If $(X_3, Y_3) \sim (X, Y)$ are independent of $(X_1, Y_1, X_2, Y_2)$, then:

$$h(X_3 + Y_3) + h(X_1) + h(Y_2) \le h(X_3 - Y_2) + h(X_1 - Y_3) + h(Z). \tag{15}$$

The proof of the discrete analog of the bound (15) in [16] contains two important steps, both of which fail for differential entropy. First, functional submodularity is used to deduce the discrete version of,

$$h(X_1, X_2, X_3, Y_1, Y_2, Y_3) + h(X_3 + Y_3) \le h(X_3, Y_3) + h(X_3 - Y_2, X_1 - Y_3, X_2, Y_1), \tag{16}$$

but (16) is trivial in the continuous case. Since $X_1 - Y_1 = X_2 - Y_2$, the six-dimensional vector in the first term is supported on a hyperplane, so that term equals $-\infty$. Second, the following simple mutual information identity (implicit in [16]) fails: If $Z = F(X)$ and $X, X'$ are conditionally independent versions of $X$ given $Z$, then $I(X; X') = H(Z)$. For continuous random variables, by contrast, $Z$ and $X$ are conditionally independent given $X'$, and hence,

$$I(X; X') \ge I(X; Z) = h(Z) - h(Z | X) = +\infty.$$

In place of this identity, we will use:

Lemma 3.10 Under the assumptions of Proposition 3.9: 

$$h(Z, Y_1, Y_2) + h(Z) - h(Y_1) - h(Y_2) = h(X_1) + h(X_2).$$

Proof. Expanding and using elementary properties,

$$\begin{aligned}
h(Z, Y_1, Y_2) + h(Z) - h(Y_1) - h(Y_2) &= h(Y_1, Y_2 | Z) + 2h(Z) - h(Y_1) - h(Y_2) \\
&= h(Y_1 | Z) + h(Y_2 | Z) + 2h(Z) - h(Y_1) - h(Y_2) \\
&= h(Y_1, Z) + h(Y_2, Z) - h(Y_1) - h(Y_2) \\
&= h(Z | Y_1) + h(Z | Y_2) \\
&= h(X_1 - Y_1 | Y_1) + h(X_2 - Y_2 | Y_2) \\
&= h(X_1) + h(X_2),
\end{aligned}$$

as claimed. □ 

Proof of Proposition 3.9. The most important step of the proof is the realization that the 
(trivial) result (16) needs to be replaced by the following: 

$$h(Z, X_3, Y_1, Y_2, Y_3) + h(X_3 + Y_3) \le h(X_3, Y_3) + h(X_3 - Y_2, X_1 - Y_3, X_2, Y_1). \tag{17}$$

Before establishing (17) we note that it implies,

$$h(X_3 + Y_3) \le h(X_3 - Y_2) + h(X_1 - Y_3) + h(X_2) + h(Y_1) - h(Z, Y_1, Y_2),$$

using the independence of $(X_3, Y_3)$ and $(Y_1, Y_2, Z)$. Combined with Lemma 3.10, this gives the required result.
To establish (17) we first note that, by construction, $X_1 - Y_1 = X_2 - Y_2 = Z$; therefore,

$$X_3 + Y_3 = X_3 + Y_3 + (X_2 - Y_2) - (X_1 - Y_1) = (X_3 - Y_2) - (X_1 - Y_3) + X_2 + Y_1,$$

and hence, by data processing for mutual information,

$$I(X_3; X_3 + Y_3) \le I(X_3; X_3 - Y_2, X_1 - Y_3, X_2, Y_1),$$

or, equivalently,

$$\begin{aligned}
h(X_3 + Y_3) - h(Y_3) &= h(X_3 + Y_3) - h(X_3 + Y_3 | X_3) \\
&\le h(X_3) + h(X_3 - Y_2, X_1 - Y_3, X_2, Y_1) - h(X_3 - Y_2, X_1 - Y_3, X_2, Y_1, X_3) \\
&= h(X_3) + h(X_3 - Y_2, X_1 - Y_3, X_2, Y_1) - h(Z, Y_1, Y_2, Y_3, X_3),
\end{aligned}$$

where the last equality follows from the fact that the linear map $(z, y_1, y_2, y_3, x_3) \mapsto (x_3 - y_2, y_1 + z - y_3, y_2 + z, y_1, x_3)$ has determinant 1. Rearranging and using the independence of $X_3$ and $Y_3$ gives (17) and completes the proof. □
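Several steps above rely on the fact that a linear change of variables with determinant $\pm 1$ leaves differential entropy unchanged, since $h(TX) = h(X) + \log|\det T|$. The Python sketch below checks the two determinants claimed so far with exact rational arithmetic. The first is the map in the proof of Lemma 3.2, written via $y = x - u$, $z = x - u - v$ with $u = x - y$, $v = y - z$. The second is the map in the proof of Proposition 3.9. The elimination routine `det` is a generic hypothetical helper, not something from the text.

```python
from fractions import Fraction


def det(rows):
    """Exact determinant by Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in rows]
    n, sign, result = len(m), 1, Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            sign = -sign
        result *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            m[r] = [m[r][k] - factor * m[col][k] for k in range(n)]
    return sign * result


# Lemma 3.2: (x, u, v) = (x, x - y, y - z) -> (x, y, z) = (x, x - u, x - u - v).
lemma_3_2 = [[1, 0, 0],
             [1, -1, 0],
             [1, -1, -1]]
# Proposition 3.9: (z, y1, y2, y3, x3) -> (x3 - y2, y1 + z - y3, y2 + z, y1, x3).
prop_3_9 = [[0, 0, -1, 0, 1],
            [1, 1, 0, -1, 0],
            [1, 0, 1, 0, 0],
            [0, 1, 0, 0, 0],
            [0, 0, 0, 0, 1]]
print(det(lemma_3_2), det(prop_3_9))  # 1 1
```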



3.2 The differential entropy Plünnecke-Ruzsa inequality

In additive combinatorics, the Plünnecke-Ruzsa inequality for iterated sumsets is a subtle result
that was originally established through an involved proof based on the theory of commutative 
directed graphs; see Chapter 6 of [17]. It is interesting that its entropy version can be proved as 
a simple consequence of the data processing bound in Lemma 3.4. See also the remark following 
Corollary 3.6 above. 

Theorem 3.11 (Plünnecke-Ruzsa inequality for differential entropy) Suppose that the random variables $X, Y_1, Y_2, \ldots, Y_n$ are independent, and that, for each $i$, $Y_i$ is only weakly dependent on $(X + Y_i)$, in that $I(X + Y_i; Y_i) \le \log K_i$ for finite constants $K_1, K_2, \ldots, K_n$. In other words,

$$h(X + Y_i) \le h(X) + \log K_i, \quad \text{for each } i.$$

Then,

$$h(X + Y_1 + Y_2 + \cdots + Y_n) \le h(X) + \log K_1 K_2 \cdots K_n,$$

or, equivalently,

$$I(X + Y_1 + Y_2 + \cdots + Y_n; Y_1 + Y_2 + \cdots + Y_n) \le \log K_1 K_2 \cdots K_n.$$

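Proof. Using Lemma 3.4 with $Y_1 + \cdots + Y_{n-1}$, $X$, $Y_n$ in place of $X, Y, Z$:

$$\begin{aligned}
h(X + Y_1 + Y_2 + \cdots + Y_n) &\le h(X + Y_1 + Y_2 + \cdots + Y_{n-1}) + h(X + Y_n) - h(X) \\
&\le h(X + Y_1 + Y_2 + \cdots + Y_{n-1}) + \log K_n,
\end{aligned}$$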

and continuing inductively yields the result. □ 
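In the Gaussian case, Theorem 3.11 reduces to the elementary fact that $1 + \sum_i t_i/s \le \prod_i (1 + t_i/s)$. Here $s$ is the variance of $X$ and $t_i$ that of $Y_i$. The Python sketch below takes the smallest admissible constants $\log K_i = h(X + Y_i) - h(X)$ and checks the conclusion. The specific variances, drawn from a seeded generator, are arbitrary assumptions.

```python
import random
from math import e, log, pi


def gauss_entropy(var):
    """Differential entropy (nats) of a centered Gaussian with variance var."""
    return 0.5 * log(2 * pi * e * var)


random.seed(1)
var_x = 2.0
var_ys = [random.uniform(0.05, 5.0) for _ in range(6)]   # Y_1, ..., Y_6

# Smallest admissible constants: log K_i = h(X + Y_i) - h(X).
log_ks = [gauss_entropy(var_x + v) - gauss_entropy(var_x) for v in var_ys]

lhs = gauss_entropy(var_x + sum(var_ys))    # h(X + Y_1 + ... + Y_n)
rhs = gauss_entropy(var_x) + sum(log_ks)    # h(X) + log(K_1 ... K_n)
print(lhs <= rhs + 1e-12)  # True
```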
By an application of the entropy Plünnecke-Ruzsa inequality we can establish the following
bound on iterated sums. 

Theorem 3.12 (Iterated sum bound) Suppose $X, Y$ are independent random variables, let $(X_0, Y_0), (X_1, Y_1), \ldots, (X_n, Y_n)$ be i.i.d. copies of $(X, Y)$, and write $S_i = X_i + Y_i$ for the sums of the pairs, $i = 0, 1, \ldots, n$. Then:

$$h(S_0 + S_1 + \cdots + S_n) \le (2n + 1)h(X + Y) - nh(X) - nh(Y).$$

Proof. Suppose the result is true for $n = 1$. Then, for each $i$,

$$h(S_0 + S_i) \le 3h(X + Y) - h(X) - h(Y) = h(S_0) + [2h(X + Y) - h(X) - h(Y)],$$

and the general claim follows from an application of the entropy Plünnecke-Ruzsa inequality (Theorem 3.11). It is applied with $S_0$ in place of $X$, $S_i$ in place of $Y_i$, and $\log K_i = 2h(X + Y) - h(X) - h(Y)$. The case $n = 1$ is an immediate consequence of the following lemma (which generalizes Lemma 3.4). The lemma is applied to $(Y_0, X_0, Y_1, X_1)$, so that $X \sim Z$ and $Y \sim W$ in its notation. □



Lemma 3.13 If $X, Y, Z, W$ are independent, then:

$$h(X + Y + Z + W) + h(Y) + h(Z) \le h(X + Y) + h(Y + Z) + h(Z + W).$$
Proof. Applying Lemma 3.4 with $Z + W$ in place of $Z$,

$$h(X + Y + Z + W) + h(Y) \le h(X + Y) + h(Y + Z + W),$$
and using Lemma 3.4 again on the last term above,

$$h(X + Y + Z + W) + h(Y) \le h(X + Y) + h(Y + Z) + h(Z + W) - h(Z).$$

Rearranging proves the claim. □
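For independent centered Gaussians $X, Y$ with variances $a, b$, the sum $S_0 + \cdots + S_n$ has variance $(n + 1)(a + b)$. The slack in Theorem 3.12 is then $\frac{1}{2}\big[n\log\big((a + b)^2/(ab)\big) - \log(n + 1)\big]$. This is nonnegative because $(a + b)^2/(ab) \ge 4$ and $4^n \ge n + 1$. The Python sketch below checks this numerically for a few assumed variance pairs and $n = 1, \ldots, 10$. The helper names are hypothetical.

```python
from math import e, log, pi


def gauss_entropy(var):
    """Differential entropy (nats) of a centered Gaussian with variance var."""
    return 0.5 * log(2 * pi * e * var)


def iterated_sum_slack(var_x, var_y, n):
    """RHS - LHS of Theorem 3.12 for independent centered Gaussians X, Y."""
    lhs = gauss_entropy((n + 1) * (var_x + var_y))   # S_0 + ... + S_n
    rhs = ((2 * n + 1) * gauss_entropy(var_x + var_y)
           - n * gauss_entropy(var_x) - n * gauss_entropy(var_y))
    return rhs - lhs


slacks = [iterated_sum_slack(a, b, n)
          for a in (0.1, 1.0, 10.0) for b in (0.5, 4.0) for n in range(1, 11)]
print(min(slacks) >= 0)  # True
```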
Let us briefly comment on the interpretation of Theorem 3.12. The result may be rewritten as

$$(n + 1)h(X + Y) - h\Big(\sum_{i=0}^{n} X_i + \sum_{i=0}^{n} Y_i\Big) \ge n[h(X) + h(Y) - h(X + Y)].$$

By independence, $h(S_0, S_1, \ldots, S_n) = (n + 1)h(X + Y)$ and $h(X, Y) = h(X) + h(Y)$. Hence the result may also be rewritten as

$$h(S_0, S_1, \ldots, S_n) - h\Big(\sum_{i=0}^{n} X_i + \sum_{i=0}^{n} Y_i\Big) \ge n[h(X, Y) - h(X + Y)]. \tag{18}$$

Thus the "differential entropy loss from summation" of the collection of $n + 1$ independent random variables $\{X_i + Y_i : i = 0, 1, \ldots, n\}$ is at least $n$ times the "differential entropy loss from summation" of the two independent random variables $\{X, Y\}$. (In the discrete case, one would have a stronger interpretation, as the entropy loss would be precisely the information lost in addition.)
3.3 The differential entropy Balog-Szemerédi-Gowers lemma

The differential entropy version of the Balog-Szemerédi-Gowers lemma stated next says that,
if X, Y are weakly dependent and X + Y has small entropy, then there exist conditionally 
independent versions of X, Y that have almost the same entropy, and whose independent sum 
still has small entropy. 

Theorem 3.14 (Balog-Szemerédi-Gowers lemma for differential entropy) Suppose $X$, $Y$ are weakly dependent in the sense that $I(X; Y) \le \log K$, i.e.,

$$h(X, Y) \ge h(X) + h(Y) - \log K, \tag{19}$$

for some $K \ge 1$, and suppose also that,

$$h(X + Y) \le \tfrac{1}{2}h(X) + \tfrac{1}{2}h(Y) + \log K. \tag{20}$$

Let $X_1, X_2$ be conditionally independent versions of $X$ given $Y$. Let $Y'$ be a conditionally independent version of $Y$ given $X_1$, so that $Y'$ is conditionally independent of $(X_2, Y)$ given $X_1$. In other words, the sequence $X_2, Y, X_1, Y'$ forms a Markov chain. Then:

$$h(X_2 | X_1, Y) \ge h(X) - \log K, \tag{21}$$
$$h(Y' | X_1, Y) \ge h(Y) - \log K, \tag{22}$$

$$h(X_2 + Y' | X_1, Y) \le \tfrac{1}{2}h(X) + \tfrac{1}{2}h(Y) + 7\log K. \tag{23}$$

Following the corresponding development in [16] for discrete random variables, we first establish a weaker result in the following proposition. The main step in the proof is the identification of the "correct" data processing bound (26) that needs to replace the use of functional submodularity. This is also a very significant difference from the proof of the discrete version of the result in [16].

Proposition 3.15 (Weak Balog-Szemerédi-Gowers lemma) Under the assumptions of Theorem 3.14, we have:

$$h(X_1 - X_2 | Y) \le h(X) + 4\log K.$$

Proof. Let $X_1$ and $X_2$ be conditionally independent as above, and let $(X_1, Y, X_2)$ and $(X_1, Y'', X_2)$ be conditionally independent versions of $(X_1, Y, X_2)$, given $(X_1, X_2)$. We claim that,

$$h(X_1, X_2, Y, Y'') + h(X_1 - X_2, Y) \le h(X_1, X_2, Y) + h(X_1 + Y'', X_2 + Y'', Y), \tag{24}$$
which is equivalent to,

$$h(X_1, X_2, Y'' | Y) + h(X_1 - X_2 | Y) \le h(X_1, X_2 | Y) + h(X_1 + Y'', X_2 + Y'' | Y). \tag{25}$$
This follows from the data processing argument:

$$\begin{aligned}
h(X_1 - X_2 | Y) &= I(X_1 - X_2; X_1 | Y) + h(X_1 - X_2 | X_1, Y) \\
&\le I(X_1 + Y'', X_2 + Y''; X_1 | Y) + h(X_2 | Y) \qquad (26) \\
&= h(X_1 + Y'', X_2 + Y'' | Y) + h(X_1 | Y) - h(X_1 + Y'', X_2 + Y'', X_1 | Y) + h(X_2 | Y) \\
&= h(X_1, X_2 | Y) + h(X_1 + Y'', X_2 + Y'' | Y) - h(X_1 + Y'', X_2 + Y'', X_1 | Y) \\
&= h(X_1, X_2 | Y) + h(X_1 + Y'', X_2 + Y'' | Y) - h(X_1, X_2, Y'' | Y).
\end{aligned}$$

Inequality (26) holds because $X_1 - X_2 = (X_1 + Y'') - (X_2 + Y'')$ is a function of $(X_1 + Y'', X_2 + Y'')$, and because $h(X_1 - X_2 | X_1, Y) = h(X_2 | X_1, Y) = h(X_2 | Y)$ by conditional independence. The last equality follows from the fact that the linear map $(x_1, x_2, y) \mapsto (x_1 + y, x_2 + y, x_1)$ has determinant $-1$. This establishes (25) and hence (24).

We now deduce the result from (24). By the independence bound for joint entropy, and since $(X_1, Y'')$ and $(X_2, Y'')$ each have the same distribution as $(X, Y)$, the second term on the right-hand side of (24) satisfies,

$$h(X_1+Y'', X_2+Y'', Y) \le 2h(X+Y) + h(Y).$$

By conditional independence and the chain rule, the first term on the right-hand side of (24) is,

$$h(X_1,X_2,Y) = h(X_1,X_2|Y) + h(Y) = h(X_1|Y) + h(X_2|Y) + h(Y) = 2h(X,Y) - h(Y).$$

Using this, conditional independence, and the independence bound for joint entropy, the first term on the left-hand side of (24) is,

$$
\begin{aligned}
h(X_1,X_2,Y,Y'') &= h(X_1,X_2) + h(Y,Y''|X_1,X_2)\\
&= h(X_1,X_2) + h(Y|X_1,X_2) + h(Y''|X_1,X_2)\\
&= 2h(X_1,X_2,Y) - h(X_1,X_2)\\
&= 4h(X,Y) - 2h(Y) - h(X_1,X_2)\\
&\ge 4h(X,Y) - 2h(Y) - 2h(X).
\end{aligned}
$$

And by the chain rule, the second term on the left-hand side of (24) is,

$$h(X_1-X_2,Y) = h(X_1-X_2|Y) + h(Y).$$

Finally, combining all the above estimates yields,

$$h(X_1-X_2|Y) + h(Y) + 4h(X,Y) - 2h(Y) - 2h(X) \le 2h(X+Y) + h(Y) + 2h(X,Y) - h(Y),$$

or,

$$h(X_1-X_2|Y) \le 2h(X+Y) + h(Y) - 2h(X,Y) + 2h(X),$$

and the claim then follows from the assumptions in (19) and (20). □
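For the reader's convenience, the final step can be written out explicitly. It uses (20) in the form $2h(X+Y) \le h(X)+h(Y)+2\log K$ and (19) in the form $-2h(X,Y) \le -2h(X)-2h(Y)+2\log K$:

$$
\begin{aligned}
h(X_1-X_2|Y) &\le 2h(X+Y) + h(Y) - 2h(X,Y) + 2h(X)\\
&\le [h(X)+h(Y)+2\log K] + h(Y) - [2h(X)+2h(Y)-2\log K] + 2h(X)\\
&= h(X) + 4\log K.
\end{aligned}
$$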

The proof of Theorem 3.14, given next, is similar. Again, the key step is an application of the data processing inequality, in (29).

Proof of Theorem 3.14. The bound (21) follows immediately from (19) and the definitions:

$$h(X_2|X_1,Y) = h(X_2|Y) = h(X|Y) = h(X,Y) - h(Y) \ge h(X) - \log K.$$

Similarly, (22) follows from (19):

$$h(Y'|X_1,Y) = h(Y'|X_1) = h(Y|X) = h(X,Y) - h(X) \ge h(Y) - \log K.$$

For (23), we first claim that the following holds,

$$h(X_1,X_2,Y,Y') + h(X_2+Y', Y) \le h(X_2,Y',Y) + h(X_1-X_2, X_1+Y', Y), \tag{27}$$

or, equivalently,

$$h(X_1,X_2,Y'|Y) + h(X_2+Y'|Y) \le h(X_2,Y'|Y) + h(X_1-X_2, X_1+Y'|Y). \tag{28}$$

As in the previous proof, this follows from a data-processing argument,

$$
\begin{aligned}
h(X_2+Y'|Y) &= I(X_2+Y';X_2|Y) + h(X_2+Y'|X_2,Y)\\
&\le I(X_1-X_2, X_1+Y'; X_2|Y) + h(Y'|X_2,Y) \qquad (29)\\
&= h(X_1-X_2,X_1+Y'|Y) + h(X_2|Y) - h(X_1-X_2,X_1+Y',X_2|Y) + h(Y'|Y)\\
&= h(X_1-X_2,X_1+Y'|Y) - h(X_1,X_2,Y'|Y) + h(X_2,Y'|Y),
\end{aligned}
$$

where (29) holds because $X_2+Y' = (X_1+Y') - (X_1-X_2)$ is a function of $(X_1-X_2, X_1+Y')$, and the last two equalities follow from the fact that $X_2$ and $Y'$ are conditionally independent given $Y$, and also from the fact that the linear map $(x_1,x_2,y) \mapsto (x_1-x_2, x_1+y, x_2)$ has determinant $-1$. This proves (28) and hence (27).
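The two determinant claims above (for the map used after (26) and the map used after (29)) are what allow these changes of variables without an entropy correction, since an invertible linear map $M$ shifts differential entropy by $\log|\det M|$. As a minimal standard-library Python sketch, not part of the original argument, the following checks both determinants and, for one hypothetical Gaussian covariance of $(X_1, X_2, Y)$ chosen only for illustration, that the Gaussian entropy is unchanged by either map:

```python
import math


def det3(m):
    """Determinant of a 3x3 matrix (list of rows), by cofactor expansion."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def matmul(p, q):
    """Matrix product p @ q for matrices stored as lists of rows."""
    return [[sum(p[r][k] * q[k][c] for k in range(len(q))) for c in range(len(q[0]))]
            for r in range(len(p))]


def transpose(m):
    return [list(col) for col in zip(*m)]


def gaussian_entropy3(cov):
    """Differential entropy (nats) of a 3-dimensional Gaussian vector with covariance cov."""
    return 0.5 * math.log((2 * math.pi * math.e) ** 3 * det3(cov))


# (x1, x2, y) -> (x1 + y, x2 + y, x1): map used for the last equality after (26)
A = [[1, 0, 1],
     [0, 1, 1],
     [1, 0, 0]]
# (x1, x2, y) -> (x1 - x2, x1 + y, x2): map used after (29)
B = [[1, -1, 0],
     [1, 0, 1],
     [0, 1, 0]]

# Hypothetical positive-definite covariance for (X1, X2, Y), for illustration only.
cov = [[2.0, 0.5, 0.3],
       [0.5, 1.5, 0.2],
       [0.3, 0.2, 1.0]]

for name, m in (("A", A), ("B", B)):
    mapped_cov = matmul(matmul(m, cov), transpose(m))  # covariance of m @ (X1, X2, Y)
    print(name, det3(m), gaussian_entropy3(cov), gaussian_entropy3(mapped_cov))
```

The Gaussian case is used only because its entropy, $\frac{1}{2}\log((2\pi e)^3\det\Sigma)$, is explicit; the identity $h(MZ) = h(Z) + \log|\det M|$ itself holds for any density.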

As in the previous proof, we bound each of the four terms in (27) as follows. By the chain rule and conditional independence, the first term is,

$$
\begin{aligned}
h(X_1,X_2,Y,Y') &= h(X_1,X_2,Y) + h(Y'|X_1,X_2,Y)\\
&= h(X_1,X_2,Y) + h(Y'|X_1)\\
&= h(Y) + 2h(X|Y) + h(X,Y) - h(X)\\
&= 3h(X,Y) - h(X) - h(Y).
\end{aligned}
$$

By the chain rule, the second term is $h(X_2+Y',Y) = h(X_2+Y'|Y) + h(Y)$, and by the independence bound for joint entropy, for the third term we have,

$$h(X_2,Y',Y) \le h(X_2,Y) + h(Y') = h(X,Y) + h(Y).$$

Finally, by the chain rule and the fact that conditioning reduces entropy,

$$
\begin{aligned}
h(X_1-X_2, X_1+Y', Y) &\le h(Y) + h(X_1-X_2|Y) + h(X_1+Y')\\
&= h(X_1-X_2|Y) + h(Y) + h(X+Y).
\end{aligned}
$$

Substituting these into (27) gives,

$$h(X_2+Y'|Y) \le h(X_1-X_2|Y) + 2h(Y) + h(X) + h(X+Y) - 2h(X,Y),$$

and, since conditioning reduces entropy, applying the weak Balog-Szemeredi-Gowers lemma of Proposition 3.15 together with (19) and (20) yields,

$$h(X_2+Y'|X_1,Y) \le h(X_2+Y'|Y) \le \tfrac{1}{2}h(X) + \tfrac{1}{2}h(Y) + 7\log K,$$

as claimed. □
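Written out, the last step combines Proposition 3.15, in the form $h(X_1-X_2|Y) \le h(X) + 4\log K$, with (19) and (20) exactly as in the proof of the proposition:

$$
\begin{aligned}
h(X_2+Y'|Y) &\le [h(X)+4\log K] + 2h(Y) + h(X) + [\tfrac{1}{2}h(X)+\tfrac{1}{2}h(Y)+\log K] - [2h(X)+2h(Y)-2\log K]\\
&= \tfrac{1}{2}h(X) + \tfrac{1}{2}h(Y) + 7\log K.
\end{aligned}
$$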

Let us comment further on the interpretation of Theorem 3.14. The assumed conditions may be rewritten as follows:

1. The variables $(X_2, Y, X_1, Y')$ form a Markov chain, with the marginal distributions of the pairs $(X_2,Y)$, $(X_1,Y)$ and $(X_1,Y')$ all being the same as the distribution of $(X,Y)$.

2. $I(X;Y) \le \log K$, i.e., if we represent the Markov chain $(X_2, Y, X_1, Y')$ on a graph, then the information shared across each edge is the same (by 1.) and it is bounded by $\log K$.

3. $I(X+Y; X) + I(X+Y; Y) \le 2\log K$.

The first two parts of the conclusion may be rewritten as:

1. $I(X_2; X_1, Y) \le \log K$;

2. $I(Y'; X_1, Y) \le \log K$.

These are evident from the graph structure of the dependence. To rewrite the third part of the conclusion, note that,

$$
\begin{aligned}
h(X) + h(Y) - 2h(X_2+Y'|X_1,Y) &= [h(X_2) - h(X_2|X_1,Y)] + [h(Y') - h(Y'|X_1,Y)]\\
&\quad + [h(X_2|X_1,Y) - h(X_2+Y'|X_1,Y)] + [h(Y'|X_1,Y) - h(X_2+Y'|X_1,Y)]\\
&= I(X_2;X_1,Y) + I(Y';X_1,Y)\\
&\quad - I(X_2+Y';Y'|X_1,Y) - I(X_2+Y';X_2|X_1,Y),
\end{aligned}
$$

where the last step uses the conditional independence of $X_2$ and $Y'$ given $(X_1, Y)$, so that, e.g., $h(X_2+Y'|Y',X_1,Y) = h(X_2|X_1,Y)$. Hence, using the first two parts of the conclusion together with (23), the third part says that

$$I(X_2+Y';Y'|X_1,Y) + I(X_2+Y';X_2|X_1,Y) \le 16\log K.$$

This is not the same as saying that the boundedness of $I(X+Y;X) + I(X+Y;Y)$ for the dependent pair $(X,Y)$ translates to boundedness of the corresponding quantity for independent $X$ and $Y$ with the same marginals (since conditioning will change the marginal distributions), but it does mean that if we embed the dependent pair $(X_1, Y)$ into a Markov chain that has $X_2$ and $Y'$ at the ends, one has boundedness on average of the corresponding conditional quantity for the pair $(X_2, Y')$ (which are conditionally independent given $X_1$ and $Y$).
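To make the Markov-chain construction concrete, here is a small Python sketch for a case that is not treated in the paper: a standard bivariate Gaussian pair $(X, Y)$ with correlation $\rho$, assumed only because every entropy is then explicit. Taking $s^2 = 1-\rho^2$, one valid realization of the chain is $X_i = \rho Y + sZ_i$ ($i = 1, 2$) and $Y' = \rho X_1 + sW$, with independent standard normals $Z_1, Z_2, W$. The sketch picks the smallest $\log K \ge 0$ allowed by (19) and (20) and then checks (21)-(23):

```python
import math

TWO_PI_E = 2 * math.pi * math.e


def h_gauss(var):
    """Differential entropy (nats) of a one-dimensional Gaussian with variance var."""
    return 0.5 * math.log(TWO_PI_E * var)


def bsg_gaussian_check(rho):
    """Check (21)-(23) for a standard bivariate Gaussian (X, Y) with correlation rho,
    using the assumed chain X2 = rho*Y + s*Z2, X1 = rho*Y + s*Z1, Y' = rho*X1 + s*W."""
    s2 = 1.0 - rho ** 2
    hX = hY = h_gauss(1.0)
    hXY = math.log(TWO_PI_E) + 0.5 * math.log(s2)  # joint entropy h(X, Y)
    hXplusY = h_gauss(2.0 + 2.0 * rho)              # Var(X + Y) = 2 + 2*rho
    # Smallest log K >= 0 for which both (19) and (20) hold.
    logK = max(0.0, hX + hY - hXY, hXplusY - 0.5 * hX - 0.5 * hY)
    # Given (X1, Y), the remaining randomness is s*Z2 (in X2) and s*W (in Y').
    lhs21 = h_gauss(s2)        # h(X2 | X1, Y)
    lhs22 = h_gauss(s2)        # h(Y' | X1, Y)
    lhs23 = h_gauss(2.0 * s2)  # h(X2 + Y' | X1, Y)
    tol = 1e-12                # slack for floating-point rounding
    return {
        "logK": logK,
        "(21)": lhs21 >= hX - logK - tol,
        "(22)": lhs22 >= hY - logK - tol,
        "(23)": lhs23 <= 0.5 * hX + 0.5 * hY + 7 * logK + tol,
    }


for rho in (0.0, 0.5, -0.5, 0.9):
    print(rho, bsg_gaussian_check(rho))
```

In this Gaussian case (21) and (22) hold with equality whenever $\log K = I(X;Y)$, while in the cases tried here (23) holds with a wide margin, as one might expect from the constant 7.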

3.4 Sumset bounds for discrete entropy 

Here we give a brief discussion of the discrete versions of the results presented so far in this 
section, their origin and the corresponding discrete proofs. 

The discrete version of the Ruzsa triangle inequality as in Theorem 3.1 was given in [14] and [16]. The analog of Lemma 3.2 for discrete random variables was established in [16], and that of Lemma 3.4 in [11]. The discrete entropy version of the lower bound in the doubling-difference inequality of Theorem 3.5 is implicit in [11] and [16], and the corresponding upper bound is implicitly derived in [11]. The discrete version of the sum-difference inequality of Theorem 3.7 is proved in [16]; the form given in Corollary 3.8 in terms of mutual information is new even in the discrete case, as is Lemma 3.10.

The discrete analog of Proposition 3.9 is implicit in [16]. The Plünnecke-Ruzsa inequality (Theorem 3.11) for discrete random variables is implicitly proved in [7], and explicitly stated and discussed in [15]. The iterated sum bound of Theorem 3.12 in the discrete case is implicit in [16], while the discrete versions of the strong and weak forms of the Balog-Szemeredi-Gowers lemma (Theorem 3.14 and Proposition 3.15) are both given in [16].

Finally, in the unpublished notes of Tao and Vu [18], the following is stated as an exercise:

Proposition 3.16 (Ruzsa covering lemma for Shannon entropy) Suppose $X, Y$ are independent discrete random variables, and let $(X_1,Y_1)$, $(X_2,Y_2)$ be versions of $(X,Y)$ that are conditionally independent given $X+Y$. Then:

$$H(X_1,X_2,Y_1|Y_2) = 2H(X) + H(Y) - H(X+Y). \tag{30}$$

We give a proof below for the sake of completeness, but first we note that the result actually fails for differential entropy: by construction we have $Y_2 = X_1 + Y_1 - X_2$; therefore, the left-hand side of the continuous analog of (30) is,

$$h(X_1,X_2,Y_1|Y_2) = h(X_1,X_2,Y_2|Y_2) = -\infty.$$

Proof. Since, by construction, $X_1 + Y_1 = X_2 + Y_2$, the triplet $(X_1, X_2, X_1+Y_1)$ determines all four random variables. Therefore, by data processing for the discrete entropy and elementary properties, we have,

$$
\begin{aligned}
H(X_1,X_2,Y_1|Y_2) &= H(X_1,X_2,Y_1,Y_2) - H(Y_2)\\
&= H(X_1,X_2,X_1+Y_1) - H(Y)\\
&= H(X_1+Y_1) + H(X_1,X_2|X_1+Y_1) - H(Y)\\
&= H(X+Y) + H(X_1|X_1+Y_1) + H(X_2|X_1+Y_1) - H(Y)\\
&= 2H(X,X+Y) - H(X+Y) - H(Y)\\
&= 2H(X,Y) - H(X+Y) - H(Y)\\
&= 2H(X) + H(Y) - H(X+Y),
\end{aligned}
$$

as claimed. □
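A brute-force check of identity (30) on small discrete examples may help make the construction of $(X_1,Y_1)$, $(X_2,Y_2)$ concrete. The Python sketch below (standard library only; the pmfs are hypothetical and chosen only for illustration) builds the joint law of two copies of $(X, Y)$ that are conditionally independent given $X+Y$, and compares both sides of (30) in nats:

```python
import math
from collections import defaultdict
from itertools import product


def entropy(pmf):
    """Shannon entropy (nats) of a pmf stored as a dict {value: probability}."""
    return -sum(p * math.log(p) for p in pmf.values() if p > 0)


def ruzsa_covering_sides(px, py):
    """Return (left side, right side) of (30) for independent X ~ px and Y ~ py."""
    pxy = {(x, y): px[x] * py[y] for x, y in product(px, py)}
    ps = defaultdict(float)  # pmf of S = X + Y
    for (x, y), p in pxy.items():
        ps[x + y] += p
    # Joint pmf of (X1, X2, Y1, Y2): two copies of (X, Y), conditionally independent
    # given X + Y, i.e. P(x1, y1) * P(x2, y2) / P(S = s) when x1 + y1 = x2 + y2 = s.
    joint = {}
    for ((x1, y1), p1), ((x2, y2), p2) in product(pxy.items(), repeat=2):
        if x1 + y1 == x2 + y2 and p1 > 0 and p2 > 0:
            joint[(x1, x2, y1, y2)] = p1 * p2 / ps[x1 + y1]
    py2 = defaultdict(float)  # marginal pmf of Y2
    for (_, _, _, y2), p in joint.items():
        py2[y2] += p
    lhs = entropy(joint) - entropy(py2)  # H(X1, X2, Y1 | Y2)
    rhs = 2 * entropy(px) + entropy(py) - entropy(ps)
    return lhs, rhs


# Hypothetical example pmfs, for illustration only.
px = {0: 0.5, 1: 0.3, 2: 0.2}
py = {0: 0.6, 3: 0.4}
print(ruzsa_covering_sides(px, py))
```

Agreement up to floating-point rounding is what (30) predicts; this illustrates, but of course does not prove, the identity.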
4 A Differential Entropy Inverse Sumset Theorem



The inverse sumset theorem of Freiman-Green-Ruzsa states that, if a discrete set $A$ is such that the cardinality of the sumset $A + A$ is close to the cardinality of $A$ itself, then $A$ is "as structured as possible" in that it is close to a generalized arithmetic progression; see [5] or [17] for details. Roughly speaking, the discrete entropy version of this result established by Tao [16] says that, if $X, X'$ are i.i.d. copies of a discrete random variable and $H(X+X')$ is not much larger than $H(X)$, then the distribution of $X$ is close to the uniform distribution on a generalized arithmetic progression. In other words, if the doubling constant $\sigma[X] = \exp\{H(X+X') - H(X)\}$ is small, then $X$ is close to having a maximum entropy distribution.

Here we give a quantitative version of this result for continuous random variables. First we note that the entropy power inequality [3] for i.i.d. summands states that,

$$e^{2h(X+X')} \ge 2e^{2h(X)},$$

or, equivalently, recalling the definition of the doubling constant from Section 3.1,

$$\sigma[X] := \exp\{h(X+X') - h(X)\} \ge \sqrt{2},$$

with equality iff $X$ is Gaussian. Note that, again, the extreme case where $h(X+X')$ is as close as possible to $h(X)$ is attained by the distribution which has maximum entropy on $\mathbb{R}$, subject to a variance constraint.
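A worked instance of the equality case, using the standard formula $h = \frac{1}{2}\log(2\pi e v)$ for the entropy of a Gaussian with variance $v$: if $X \sim N(\mu, v)$, then $X + X' \sim N(2\mu, 2v)$, so

$$\log\sigma[X] = \tfrac{1}{2}\log(2\pi e\cdot 2v) - \tfrac{1}{2}\log(2\pi e v) = \tfrac{1}{2}\log 2, \qquad\text{i.e.}\qquad \sigma[X] = \sqrt{2},$$

independently of $\mu$ and $v$.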

Next we give conditions under which the doubling constant $\sigma[X]$ of a continuous random variable is small only if the distribution of $X$ is appropriately close to being Gaussian. Recall that the Poincaré constant $R(X)$ of a continuous random variable $X$ is defined as,

$$R(X) = \sup_{g \in H_1(X)} \frac{E[g^2(X)]}{E[(g'(X))^2]},$$

where the supremum is over all functions $g$ in the space $H_1(X)$ of absolutely continuous functions with $E[g(X)] = 0$ and $0 < \mathrm{Var}(g(X)) < \infty$. As usual, we write $D(f\|g)$ for the relative entropy $\int f\log(f/g)$ between two densities $f$ and $g$.

Theorem 4.1 (Freiman-Green-Ruzsa theorem for differential entropy) Let $X$ be an arbitrary continuous random variable with density $f$.

(i) $\sigma[X] \ge \sqrt{2}$, with equality iff $X$ is Gaussian.

(ii) If $\sigma[X] \le C$ and $X$ has finite Poincaré constant $R = R(X)$, then $X$ is approximately Gaussian in the sense that,

$$\frac{1}{2}\|f - \phi\|_1^2 \le D(f\|\phi) \le \left(\frac{2R}{\sigma^2} + 1\right)\log\left(\frac{C}{\sqrt{2}}\right),$$

where $\sigma^2$ is the variance of $X$ and $\phi$ denotes the Gaussian density with the same mean and variance as $X$.

At first sight, the assumption of a finite Poincaré constant may appear unnecessarily restrictive in the above result. Indeed, we conjecture that this assumption may be significantly relaxed. On the other hand, a related counterexample by Bobkov, Chistyakov and Götze [1] suggests that there is good reason for caution.
Theorem 4.2 Let $X$ be an arbitrary continuous random variable with density $f$ and finite variance. Let $\phi$ be the Gaussian density with the same mean and variance as $f$.

(i) $\sigma[X] \le \sqrt{2}\exp\{D(f\|\phi)\}$, with equality iff $X$ is Gaussian.

(ii) [1] For any $\eta > 0$ there exists a continuous random variable $X$ with,

$$\sigma[X] > (\sqrt{2} - \eta)\exp\{D(f\|\phi)\}, \tag{31}$$

but with a distribution well separated from the Gaussian, in that,

$$\|f - \phi\|_1 > C, \tag{32}$$

where $C$ is an absolute constant independent of $\eta$.

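Proof of Theorem 4.1. Part (i) follows from the entropy power inequality, as discussed at the beginning of the section, and the first inequality in part (ii) is simply Pinsker's inequality [3].

For the main estimate, assume without loss of generality that $X$ has zero mean, and recall that Theorem 1.3 of [6] says that,

$$D\left(\frac{X+X'}{\sqrt{2}}\right) \le \left(\frac{2R}{\sigma^2 + 2R}\right) D(X),$$

where, for any finite-variance, continuous random variable $Y$ with density $g$, $D(Y)$ denotes the relative entropy between $g$ and the normal density $\phi_Y$ with the same mean and variance as $Y$. Since $D(Y)$ can be expanded to $D(Y) = h(\phi_Y) - h(Y)$, and since $(X+X')/\sqrt{2}$ has the same variance $\sigma^2$ as $X$, the above expression simplifies to,

$$h\left(\frac{X+X'}{\sqrt{2}}\right) - h(X) \ge \left(\frac{\sigma^2}{2R + \sigma^2}\right)[h(\phi) - h(X)], \tag{33}$$

or, noting that $h((X+X')/\sqrt{2}) = h(X+X') - \log\sqrt{2}$ and $h(\phi) - h(X) = D(f\|\phi)$,

$$D(f\|\phi) \le \left(\frac{2R}{\sigma^2}+1\right)\log\left(\frac{\sigma[X]}{\sqrt{2}}\right) \le \left(\frac{2R}{\sigma^2}+1\right)\log\left(\frac{C}{\sqrt{2}}\right),$$

as claimed. □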
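As a numerical sanity check of Theorem 4.1 and of Theorem 4.2(i), the following Python sketch evaluates the relevant quantities for a hypothetical example not considered in the paper: $X$ uniform on $[0, L]$. It relies on standard closed forms listed in the docstring (in particular, the Poincaré constant $L^2/\pi^2$ of the uniform distribution on $[0, L]$) and on a midpoint-rule approximation of $\|f-\phi\|_1$; the bound of Theorem 4.1(ii) is evaluated with $C = \sigma[X]$.

```python
import math


def uniform_example(L=1.0):
    """Quantities in Theorems 4.1 and 4.2 for X ~ Uniform[0, L] (hypothetical example).

    Standard closed forms used:
      h(X) = log L,  Var(X) = L^2 / 12,
      X + X' is triangular on [0, 2L], so h(X + X') = 1/2 + log L,
      Poincare constant R(X) = L^2 / pi^2 (sharp Neumann/Wirtinger constant on [0, L]).
    """
    var = L ** 2 / 12
    h_x = math.log(L)
    h_sum = 0.5 + math.log(L)
    R = L ** 2 / math.pi ** 2
    sigma = math.exp(h_sum - h_x)                          # doubling constant sigma[X]
    D = 0.5 * math.log(2 * math.pi * math.e * var) - h_x   # D(f || phi) = h(phi) - h(X)
    upper = (2 * R / var + 1) * math.log(sigma / math.sqrt(2))  # Theorem 4.1(ii), C = sigma[X]
    return sigma, D, upper


def l1_to_gaussian(L=1.0, n=200_000):
    """Midpoint-rule approximation of ||f - phi||_1 for f the Uniform[0, L] density."""
    mean, sd = L / 2, L / math.sqrt(12)
    a, b = mean - 6 * sd, mean + 6 * sd  # window contains [0, L] and almost all Gaussian mass
    dx = (b - a) / n
    total = 0.0
    for k in range(n):
        x = a + (k + 0.5) * dx
        f = 1.0 / L if 0.0 <= x <= L else 0.0
        phi = math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))
        total += abs(f - phi) * dx
    return total


sigma, D, upper = uniform_example()
print(f"sigma[X] = {sigma:.4f}, sqrt(2) = {math.sqrt(2):.4f}, sqrt(2)*exp(D) = {math.sqrt(2) * math.exp(D):.4f}")
print(f"Pinsker lower bound = {0.5 * l1_to_gaussian() ** 2:.4f}, D = {D:.4f}, upper bound = {upper:.4f}")
```

Since all three quantities returned by `uniform_example` are scale-invariant, the result does not depend on $L$. One should find $\sigma[X] = e^{1/2}$, strictly between $\sqrt{2}$ and $\sqrt{2}\exp\{D(f\|\phi)\}$, and $D(f\|\phi)$ between the Pinsker lower bound and the upper bound of Theorem 4.1(ii).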

Proof of Theorem 4.2. As in the last proof, for any finite-variance, continuous random variable $Y$ with density $g$, write $D(Y)$ for the relative entropy between $g$ and the normal density $\phi_Y$ with the same mean and variance as $Y$, so that $D(Y) = h(\phi_Y) - h(Y)$. Then, letting $X, X'$ be two i.i.d. copies of $X$,

$$
\begin{aligned}
D\left(\frac{X+X'}{\sqrt{2}}\right) &= h(\phi) - h\left(\frac{X+X'}{\sqrt{2}}\right)\\
&= h(X) - h(X+X') + h(\phi) - h(X) + \log\sqrt{2}\\
&= \log\left(\frac{\sqrt{2}\exp\{D(f\|\phi)\}}{\sigma[X]}\right). \qquad (34)
\end{aligned}
$$

The result of part (i) follows from (34) upon noting that relative entropy is always nonnegative. Part (ii) is a simple restatement of the counterexample in Theorem 1.1 of [1]. Taking in their result $\epsilon = -\log(1 - \eta/\sqrt{2})$, we are guaranteed the existence of an absolute constant $C$ and a random variable $X$ such that (32) holds and $D(X+X') < \epsilon$. But, since $D(\cdot)$ is invariant under scaling, so that $D(X+X') = D((X+X')/\sqrt{2})$, using (34) this translates to,

$$\log\left(\frac{\sqrt{2}\exp\{D(f\|\phi)\}}{\sigma[X]}\right) < -\log\left(1 - \frac{\eta}{\sqrt{2}}\right),$$

and, rearranging, this is exactly condition (31). The fact that $X$ can be chosen to have finite variance is a consequence of the remarks following Theorem 1.1 in [1]. □
We close this section by noting that the two results above actually generalize to the difference constant,

$$\delta[X] := \exp\{h(X - X') - h(X)\},$$

for i.i.d. copies $X, X'$ of $X$, in place of the doubling constant $\sigma[X]$.

Corollary 4.3 Let $X$ be an arbitrary continuous random variable with density $f$.

(i) $\delta[X] \ge \sqrt{2}$, with equality iff $X$ is Gaussian.

(ii) If $\delta[X] \le C$ and $X$ has finite Poincaré constant $R = R(X)$, then $X$ is approximately Gaussian in the sense that,

$$\frac{1}{2}\|f-\phi\|_1^2 \le D(f\|\phi) \le \left(\frac{2R}{\sigma^2}+1\right)\log\left(\frac{C^2}{\sqrt{2}}\right),$$

where $\sigma^2$ is the variance of $X$ and $\phi$ denotes the Gaussian density with the same mean and variance as $X$.

Proof. Since the entropy power inequality [3] holds for arbitrary independent random variables, the proof of (i) is identical to that in Theorem 4.1, with $-X'$ in place of $X'$. For (ii) we assume again without loss of generality that $X$ has zero mean and recall that, from Theorem 3.5, we have,

$$h(X+X') \le 2h(X-X') - h(X).$$

Combining this with the estimate (33) obtained in the proof of Theorem 4.1 yields,

$$2h(X-X') - 2h(X) - \log\sqrt{2} \ge \left(\frac{\sigma^2}{2R+\sigma^2}\right)[h(\phi) - h(X)],$$

or,

$$D(f\|\phi) \le \left(\frac{2R}{\sigma^2}+1\right)\log\left(\frac{\delta[X]^2}{\sqrt{2}}\right) \le \left(\frac{2R}{\sigma^2}+1\right)\log\left(\frac{C^2}{\sqrt{2}}\right),$$

as claimed. □

Corollary 4.4 Let $X$ be an arbitrary continuous random variable with density $f$ and finite variance. Let $\phi$ be the Gaussian density with the same mean and variance as $f$.

(i) $\delta[X] \le \sqrt{2}\exp\{D(f\|\phi)\}$, with equality iff $X$ is Gaussian.

(ii) For any $\eta > 0$ there exists a continuous random variable $X$ with,

$$\delta[X] > (\sqrt{2}-\eta)\exp\{D(f\|\phi)\}, \tag{35}$$

but with a distribution well separated from the Gaussian, in that,

$$\|f-\phi\|_1 > C, \tag{36}$$

where $C$ is an absolute constant independent of $\eta$.
Proof. As in the proof of Theorem 4.2, with $-X'$ in place of $X'$ we have,

$$0 \le D\left(\frac{X-X'}{\sqrt{2}}\right) = h(X) - h(X-X') + h(\phi) - h(X) + \log\sqrt{2} = \log\left(\frac{\sqrt{2}\exp\{D(f\|\phi)\}}{\delta[X]}\right),$$

giving (i). Part (ii) follows from Theorem 1.1 of [1] exactly as in the proof of the corresponding result in Theorem 4.2, since the distribution of the random variables in the counterexample given in [1] can be taken to be symmetric, so that $X + X'$ has the same distribution as $X - X'$. □